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Multiplicative cascades are often used to represent the structure of multiscaling variables in many 
physical systems, specially turbulent flows. In processes of this kind, these variables can be under- 
stood as the result of a successive transfer in cascade from large to small scales. For a given signal, 
only its optimal wavelet basis can represent it in such a way that the cascade relation between scales 
becomes explicit, i.e., it is geometrically achieved at each point of the system. Finding such a basis 
is a data-demanding, highly-complex task. In this paper we propose a formalism that allows to find 
the optimal wavelet in an efficient, less data-demanding way. We confirm the appropriateness of this 
approach by analyzing the results on synthetic signals constructed with prescribed optimal bases. 
Although we show the validity of our approach constrained to given families of wavelets, it can be 
generalized for a continuous unconstrained search scheme. 
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I. INTRODUCTION 

Multiplicative processes giving rise to cascades are 
quite ubiquitous in Nature. Either as a real mech- 
anism or as an effective one, cascades spontaneously 
develop in many scale-free systems. For instance, in 
a three-dimensional flow under fully developed tur- 
bulence, the energy is transferred from large to small 
scales (where it is released) through a well defined cas- 
cade process. In analogy, for an arbitrary cascade, the 
transfer of information between different scales, mea- 
sured in terms of appropriate variables, reveals also 
an interesting hierarchical structure [Tj whose analy- 
sis provides useful information about some features of 
the system. In particular the distribution of scaling 
exponents can be determined from the study of the 
statistical properties of the cascade process [3] . 

Apart from some models [4j [5j [6] , there have been 
few attempts to characterize the structure of particu- 
lar signals in terms of local cascade descriptors. The 
advantage of such an approach is that one can extract 
geometrical information about the signal in contrast 
to standard methods where only global statistical in- 
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formation is available. Given a signal (or dataset), the 
key point is to find a representation basis where the 
cascade process can be expressed in a microcanonical 
form, in other words, to find an appropriate trans- 
formation in which the representation variables are 
precisely these local cascade descriptors. 

Wavelets are a standard analysis tool in signal pro- 
cessing [7J: wavelet projections are integral trans- 
forms that separate the relevant details of a signal 
at different scale levels, and since they are tuned to 
an adjustable scale, they are appropriate to analyze 
the multiscale behavior of cascade processes and to 
represent them. Most of the standard wavelets are 
able to accurately estimate the distribution of energy 
(or equivalent quantity) at each stage of the cascade, 
something that is very useful as a global descriptor. 
In addition, for a given system, there is a particular 
wavelet called optimal wavelet that also characterizes 
the dynamics at a local level, as it corresponds to the 
proper representation basis for cascade variables. The 
main advantage of optimal wavelet projections is that 
they can be expressed as products of successive cas- 
cade variables chosen along a branch of a dyadic tree. 
This representation is minimally redundant, as cas- 
cade variables are independent between consecutive 
cascade stages, and it defines a local effective dynam- 
ics that opens the way to new theoretical develop- 
ments and practical applications [U [TU] . 

An attempt to find the optimal wavelet of natural 
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images from a sample dataset has been reported in 
[TT] . The methodologies presented there are quite lim- 
ited, as they exploit particular symmetries of natural 
images, and the uncertainty in the so-obtained empir- 
ical optimal wavelet is rather large to allow fine de- 
velopments. In this paper we will present an improve- 
ment of the methodology presented in in order 
to derive the optimal wavelet of more general types of 
data with more precision. Our study is focused on the- 
oretical and methodological aspects of this problem, 
and is validated using synthesized data with known 
optimal wavelet. 

The paper has the following arrangement: The next 
section explains the concept of multiplicative cascade 
and how it is identified in real signals. In section[TTT]we 
mathematically formalize canonical and microcanon- 
ical cascades through the use of wavelet projections, 
and we also introduce the concept of optimal wavelet. 
In section |TV| we introduce a quantifier of the optimal- 
ity degree and discuss about optimization strategies. 
Then, we generate synthetic cascades and check their 
optimality, showing the results in section [V] Finally, 
in section [VI] we give our conclusions. 



II. PERSISTENCE IN SCALE INVARIANT 
SIGNALS 

Multiplicative cascades are present in many differ- 
ent systems, but they are not usually recognized as 
such. Usually, their presence is reported by means of 
indirect evidence about its effects on the properties 
of signals. One of the most commonly reported effect 
of multiplicative cascades is the persistence of feature 
detection across scales. The importance of persistence 
is that the detection of a feature at a coarse scale al- 
lows to infer the presence of the same feature at finer 
scales. This phenomenon is well known since the in- 
troduction of wavelet representation of signals, and it 
is first described by Mallat and co-workers [T2"l [13]. 
The optimal wavelet is the one that maximizes this 
inference capability. 

To understand what is the role of wavelet process- 
ing it is convenient to clarify what a multiresolution 
decomposition is. In a multiresolution decomposi- 
tion the signal can be represented as a combination 
of wavelet coefficients that can be arranged according 
progressive levels of resolution, from finer to coarser. 
This representation is just an algebraic change of ba- 
sis, so the multiresolution decomposition of a signal 
contain exactly the same information as the original 
signal, and we can pass from one to the other with a 
linear transformation and without any loss of informa- 
tion. In the case of ID signals a single wavelet can be 



used to fully represent the signal in a dyadic scheme; 
for 2D signals, we need three different wavelets that 
will expand three different pyramids of resolution lev- 
els. In a dyadic scheme, when we pass from one resolu- 
tion to the next coarser one the scale changes by a fac- 
tor two, i.e., the diameter of the wavelet at the coarser 
scale is exactly twice the diameter of the wavelet at 
the previous, finer scale. This implies that a wavelet 
coefficient obtained at the coarser scale affects an area 
that is twice larger in diameter than that of the finer 
scale; roughly speaking, a wavelet coefficient at the 
coarser scale covers the area of two wavelet coefficients 
at the finer scale in ID and the area of four wavelet 
coefficients at the finer scale in 2D. In section llTTl the 
concepts of wavelet basis and dyadic decomposition 
will be introduced in greater detail; see also [7j Q3] . 

In Figure [l] we show a typical example of edge per- 
sistence. In the left panel we present a CCD-recorded 
snapshot of the distribution of dye under the action 
of 2D turbulence; the image was obtained in a labora- 
tory experiment of dispersion of passive scalars under 
the action of direct enstrophy cascade (for details on 
the experiment see [THIHEIHI])- In the right panel we 
show a multiresolution decomposition of this image 
in a 2D separable wavelet basis, namely Haar basis. 
The multiresolution decomposition on the right panel 
of Figure [T] presents all the wavelet coefficients of the 
representation in a compact shape. A 2D multireso- 
lution basis requires three wavelets and hence there 
are three types of wavelet coefficients, which in this 
case can be labeled as horizontal (leftmost squares), 
vertical (those with a side on the bottom of the panel) 
and diagonal. 



FIG. 1: Left: Snapshot of dye distribution submitted 
to the action of 2D turbulence; see the text for details. 
Right: Multiresolution decomposition with Haar basis of 
the image on the left; each resolution level and orientation 
has been independently normalized to enhance details. 

In Figure [2] we present a detail of three consecutive 
resolutions of vertical coefficients extracted from Fig- 
ure [I] Notice that the multiresolution decomposition 
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is just a change of vectorial basis, so the wavelet co- 
efficients are algebraically independent. It is however 
obvious from Figure [2] that the coefficients do not take 
arbitrary values: the edges detected at coarser scales 
persist at the same location but with better resolution 
at the finer scales. This is the persistence of edges, 
and it is a consequence of the structure of the signal, 
which implies that on many real systems edges are 
multiscalc. 




FIG. 2: The three finer resolution levels of vertical wavelet 
coefficients, extracted from Figure [I] going from left to 
right we go from the coarser to the finest resolution. The 
three resolution levels are represented at the same size to 
help comparison. 

Edge persistence is a strong, relevant feature of 
physical signals, as it implies that the signal is highly 
redundant. It is precisely by means of the wavelet 
representation that this redundancy becomes evident. 
Persistence implies that we can predict to some ex- 
tent what is going to happen at the next resolution 
level from the wavelet coefficients of a given level. 
Some authors [3] Ei] have exploited this redundancy 
to devise algorithms for image compression. Partic- 
ularly, Simoncelli and co-workers have noticed that 
the mutual dependence between consecutive scales 
can be better highlighted using conditional histograms 
HH1 [HI [20]. The histograms of fine-scale (also 
called "child") coefficients conditioned by the value 
of the coarse-scale (also called "parent" ) coefficient at 
the same location have a clear tie-bow shape for any 
wavelet El HD] (we also observe the same behavior in 
Figure 13] top). This shape implies that the disper- 
sion of the child increases with the absolute value of 
the parent coefficient. This suggests that the child 
coefficient depends on its parent coefficient in a mul- 
tiplicative fashion. For that reason, the distribution 
of the logarithm of the child coefficient conditioned by 
a value of the logarithm of the parent coefficient ex- 
hibits a linear dependence [HIES] ( see also our Figure [3] 
bottom). The authors found that, depending on the 
wavelet, the range of validity of this linear dependence 
can be larger or smaller. 

More recently, Pottier et al. [TU] studied satellite 
images of surface chlorophyll concentration and found 



them to be persistent across scales. Although they 
used very different wavelet bases, for none of them the 
histogram of the logarithm of the child conditioned by 
the logarithm of the parent have a full linear range. As 
we will see later, a wavelet for which the conditioned 
histogram is fully linear is called optimal, in the same 
sense that the one introduced by Turiel and Parga in 
[TT] . Pottier et al. proposed a particular model to 
describe the child-parent dependency, valid for many 
different wavelets that are not the optimal one but are 
not too far from it anyway. We will call this model 
the linear model, and it reads as: 

«C = Vo "P + a (1) 

where ac stands for the child wavelet coefficient and 
ap stands for its parent (i.e., it is obtained at the 
immediately coarser scaler and at the same position). 
rjo and ao are random variables mutually independent, 
also independent from cvp . The authors observed that 
this model fits reasonably well the conditioned his- 
tograms for many different wavelet bases, although 
depending on the particular basis the amplitude of 
the variable cvo varies; for smaller cvo the linear range 
in the conditioned histogram is larger and the con- 
verse. Now a question reasonably raises: is there any 
particular choice of wavelet for which the amplitude 
of ao vanishes? This would be the optimal wavelet, in 
the same sense as in [9] [11] . 

The importance of finding such an optimal wavelet 
must be stressed. First, because with the aid of this 
wavelet the description of the mutual dependence be- 
tween parents and children can be simplified; in fact, 
«o — implies that the mutual information between 
cvp and ac is maximized. So, a coding scheme as 
the one proposed by [5] attains the highest quality 
and smallest coding cost with the use of this wavelet. 
Besides, using this wavelet basis the inference of the 
value of the coefficients is improved, what has an im- 
pact on the quality of reconstruction algorithms to fill 
data gaps (as in [10]) or on forecasting of time series. 
Finally, optimal wavelets can be used to derive im- 
proved models of multifractal systems (for instance, 
some variables under fully developed turbulence). 



III. TOWARDS AN OPTIMAL 
REPRESENTATION OF DATA 

A. Canonical cascades 

The paradigm of systems in which multiplicative 
cascades develop are scale-invariant systems with one 
or many fractal interfaces. In them, conveniently de- 
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signed intensive variables put in evidence a complex 
interplay between different scales. 

Let s(x) be a physical variable representing the sig- 
nal under study. To study the scale relations of the 
system, we will need a properly defined, intensive, 
scale-dependent functional T applied to the signal, 
T [s] (r, x) . This variable depends on the point x and 
a scope or scale parameter r that characterizes the 
range of influence of the functional. Typical examples 
of such a functional include the derivative at radius r, 
nonlinear measures based on the derivative or wavelet 
projections. 

The canonical approach to multiplicative cascades 
is a statistical approach. Hence, the object under 
study is the distribution of the variable T[s](r, •) for 
different values of the scale parameter r only, dis- 
regarding the localization x, i.e., considering all the 
points as statistically equivalent. That is why we will 
simply denote this variable as T r . The analysis of its 
distribution is achieved through its order-p moments; 
studying the moments is enough to completely define 
the distribution provided they do not diverge too fast 
with p |2T]. 

A multiscaling (also called multifractal) signal s is 
characterized by the power-law scaling in the order-p 
moments of the related variable T r , in the way: 



o(r Tp ) 



(2) 



Recall that the symbol o(r Tp ) means a contribution 
that is negligible compared to r Tp when r goes to zero. 
In fractal signals, the exponent t p is directly propor- 
tional to the moment order p and the proportionality 
constant is called singularity exponent or Hurst expo- 
nent. In multifractal signals pQ, the dependence of t p 
on p is nonlinear, a fact known as anomalous scaling. 
In [A] the connection between geometry and statistics 
of multifractal signals is discussed in greater detail. 

In order to separate the part of the statistics that 
has to do with changes in scale, two different scales 
r, L with r < L can be compared, so: 



(TZ) 



(3) 



which is valid at lowest order in the limit of small r 
and L. For some particular t p , this relation implies 
the existence of a variable r) K such that: 



(4) 



where k = r/L < 1. Notice that one of the conditions 
for the existence of this variable is the validity of the 
expansion above, which in turn depends on taking a 
scale ratio parameter k smaller than 1; for this rea- 
son we have taken the ratio of the smaller scale by 



the larger scale. Notice also that there is no general 
proof on the existence of rj K for an arbitrary r p ; it can 
however be assumed to exist if t p defines infinitely di- 
visible processes H21 |5S]. These cases cover 
many situations of interest, such as log-normal, log- 
Levy or log-Poisson processes. 

With the aid of the variable r/ K we can express 
eq. ([3| in a more elegant way, making the cascade 
relation explicit: 



Vr/L 



(5) 



with Vr/L and being mutually independent. Here 
the symbol = means that the equality holds distribu- 
tionally, i.e., p(T r ) = p(j\ r /L Tl). However, this re- 
lation does not necessarily hold pointwise, as we will 
explain in the following subsection. 

The introduction of eq. (|5 1 now allows to split the 
statistics of the scaling variable T r in two parts: one 
part, given by Vr/L, accounts for the properties of 
transformation under changes in scale, while the other 
part, given by T^, takes into account the behavior at 
a given reference scale L. Taking L as the largest pos- 
sible scale in the system, the distribution of all the 
variables T r at any arbitrary scale r can be referred 
to the fixed level once the process of change in 
scale, rj r /L, is known. 

We will call the r\ r /L cascade variables. Their distri- 
butions do not depend on the particular scales r and 
L they connect but only on the scale ratio k = r/L. If 
we now consider three scales r < r' < L and we apply 
eq. ^ to the three possible scale pairs it follows: 



Vr/L ~ Vr/r' Vr'/L 



(G) 



from which the name "cascade variable" becomes evi- 
dent: the variable relating scales r and L is equivalent 
to the product of the variables relating any two inter- 
mediate scales. If any intermediate scale is allowed, 
it follows that the cascade variables must have an in- 
finitely divisible distribution [22) [26] [27; . Another im- 
portant characteristic of the distribution of the cas- 
cade variables is that it is a property of the signal and 
does not depend on the particular functional T used 
to obtain them, i.e., any functional capable to resolve 
the scaling exponents t p of the signal in eq. ^ leads 
to exactly the same distribution of cascade variables 

Vr/L DP- 

B. Microcanonical cascade 

Equation ^ makes sense only as a distributional 
equality and does not imply that the functional of 
scale r at some point x is related to the functional of 
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scale L at the same point through an independent mul- 
tiplicative factor. In general, T[s](r, x) and T[s](L,x) 
are not related by a variable rj r /L(x) that is statis- 
tically independent of T[s](L,x). Of course, we can 
always define fj r / L (x) as the ratio of these two vari- 
ables, 



Vt/l(x) 



T[s](r,£) 



(7) 



but for most of the possible functionals T, the vari- 
ables fj r /i(x) are not independent of T[s](L,x) and 
thus they cannot be considered cascade variables, as 
they do not verify eq. (JfjJ. It is convenient to deal 
with cascade variables, as they are independent of the 
starting scale and only depend on the ratio of scales; 
this implies that they serve both to characterize the 
global properties of the system and to compactly cod- 
ify its dynamics. 

In many multifractal systems, the cascade process 
governs their dynamics as a local effective mechanism, 
what implies that there is a local variable f} r /i,{x) 
transferring energy, matter or information (depending 
on the system) from coarser to finer scales. Therefore, 
there may exist a system variable s and a scale-tunable 
functional T for which eq. ^ makes sense not only 
distributionally but also at any point x of the system. 
That is what we call microcanonical cascade. 

Among the functionals that are most commonly 
used to analyze the scaling properties of multifrac- 
tal systems, wavelets occupy a prominent position. 
Wavelets have been used to perform local Fourier anal- 
ysis and to characterize the local singularities of func- 
tions |28j . In many different multifractal systems, 
wavelet projections have been used to characterize 
their scaling properties with success [551 ISO]- Some- 
thing that is very convenient about wavelet projec- 
tions is that they can be inverted to retrieve the orig- 
inal signal [2], so wavelet projections do not only an- 
alyze the signal, but also constitute a representation 
of it. That is why wavelet projections are good can- 
didates to realize the microcanonical cascade. 

A wavelet is a function that oscillates in the center of 
its domain and decays in its tails; we can think about 
wavelets as a pulse that decays very fast. Let s(x) be 
a multifractal signal and let ^f(x) be a wavelet. We 
define the wavelet projection of s on \I> at the position 
x and the resolution scale r as: 



T*[s](r,£) = &ys(y)$> 



x-y 



(8) 



In terms of wavelet projections, a microcanonical cas- 
cade has the following form: 



Notice that the key point is that r/ r / L (x) has to be 
both a cascade variable -in the sense of eq. ([6|- and 
independent from T$[s](L, x). We can thus define the 
optimality of a wavelet as the degree of independence 
oif] r / L {x) vs. Ty{s](L,x); we will discuss this possi- 



bility in depth in Section IV There are evidences that 
such an optimal wavelet exists in natural images |llj 
and in marine turbulence |10j for the specific case of 
wavelet dyadic representations. 



C. Dyadic representations of the cascade 

Wavelet projections of a signal can be used to char- 
acterize the local properties of the signal [55] or to 
represent it in an efficient scheme [H]. Although a 
signal can be retrieved from its wavelet projections if 
the wavelet is admissible |14j . such a representation is 
highly redundant and so a subset of wavelet projec- 
tions must be retained. A typical way to subsample 
continuous wavelet projections is to select a dyadic 
subset (then called wavelet coefficients) from which 
the signal is fully reconstructed. In a dyadic subset, 
scales vary by a factor two and at each resolution level 
the positions are taken as integer amounts of the res- 
olution size. 

To keep formulae simple, hereafter we will limit our 
attention to one-dimensional systems. Hence, given 
a ID signal s(x) and a wavelet W capable to spawn 
a dyadic representation basis, the signal can be ex- 
panded as a series of wavelet terms: 



s(x) 



j=~oo k 



where 



k) 



(10) 



(11) 



and j, k are integer numbers. The coefficients of this 
representation basis, the ct^fc, are called wavelet coef- 
ficients. As iff defines a representation basis, there is 
a unique set of wavelet coefficients {ctj^k} such that 
eq. ( 10 1 is valid. The 2~- J / 2 normalization factor en- 
sures that the 2-norm is 1, Jdx \*ffj^(x)\ 2 
If the wavelet basis is orthonormal. i.e. 



= 1. 



(*i,fc||*i',fc') = J dx*ik(x)Vj>,k>(x) = S J:3 >Sk,k' 

(12) 

we can obtain the wavelet coefficients as projections 
on the wavelet basis, namely: 



T*[«](r,f) = r] r/L {x) T*[a](L,2) (9) 



«i,k = (*j.*||a> 



(13) 



() 



If the wavelet basis is not orthonormal, the exten- 
sion is rather straightforward: each basis vector ||^j,fc) 
has its dual (^j,fc|| so that (^j,k\\^j',k') — &k,k' 
and the wavelet coefficients can be obtained as (Xj t k = 

fell*)- 

In terms of a dyadic representation, the cascade 
takes a relatively simple form. For any wavelet ba- 
sis, the canonical cascade relation, eq. (pj, takes the 
following form: 



*j,k = % Q!j-X,Lfc/2j 



(14) 



where the notation [^/2J means the integer part of 
k/2. Here we have written the cascade relation mim- 
icking eqs. (|5| and (J9j, although that the wavelet coef- 
ficients Oj jt are not intensive variables as the wavelet 
projections are as defined in eq. ^ (while wavelet pro- 
jections are oo-norm normalized, wavelet coefficients 
are 2-norm normalized, which is highly convenient in 
the derivations to follow, especially in section |IV| . 
This means that the 77- like variables written hereafter 
will differ from those appearing in Equations Q to ^ 
in a constant normalization factor of \JrjL — . 

Notice that otjk is the wavelet projection at the 
scale rc — 2 J ' and position xq — 2 J fc, while atj+i ,[fc/2j 
is the wavelet projection at the coarser scale rp = 2 J+1 
and position xp = 2 3+1 [k/2\ ; the positions xq and xp 
differ at most by rc, which is the spatial uncertainty 
at the scale rp, so at the scale rp we can consider 
that xq and xp refer to the same position. To alle- 
viate the notation, for given fixed scale index j and 
position index fc, ap = ctj+i ,lfe/2j is known as the Par- 
ent coefficient, ac = otj,k is f ne Child coefficient and 
the cascade variable is r\ = r\ 1 , and we just write the 
canonical cascade relation above as: 



77 a P 



(15) 



A dyadic wavelet basis is said to be optimal if the 
associated wavelet coefficients verify the microcanon- 
ical cascade relation, namely: 



a c 



77 a P 



(16) 



where rj is independent of the parent wavelet coeffi- 
cient ap and is thus a cascade variable with associated 
scale ratio |. 

It has been shown that if the optimal wavelet exists, 
there is a constructive formula to unambiguously ob- 
tain it from a large enough dataset [IT] . This formula 
proves uniqueness of the wavelet, but it is rather un- 
stable, specially in estimating the tails of the wavelet, 
and it is not very useful unless a large amount of data 
is available as learning set [3TJ [32] . We will next an- 
alyze alternative strategies for a more stable determi- 
nation of the optimal wavelet. 



IV. OPTIMIZATION FROM SUBOPTIMAL 
REPRESENTATIONS 

A. Quadrature Mirror Filters 

Dyadic wavelet expansions can be used to describe 
the cascade with a discrete set of parameters. A par- 
ticularly, widely used way to implement dyadic rep- 
resentation is in terms of Quadrature Mirror Filters 
(QMF), which are very robust in practical applica- 
tions. The main advantage of QMFs is that they 
are discrete filters so both obtaining the wavelet co- 
efficients from a signal and reconstructing the signal 
from its wavelet coefficients are numerically exact op- 
erations (apart from round-off errors) . In the follow- 
ing, we will summarize the most relevant facts about 
QMFs; the interested reader can consult some wavelet 
textbooks [7lH"4]. 

When a function "J defines a wavelet basis, it is pos- 
sible to find another function <f>, called unity function, 
that is orthogonal to the wavelet but, contrarily to it, 
has non-zero mean. Another important property of 
unity functions is that they can be used to represent 
the approximation of the signal at a given scale. 

The approximation of a signal s(x) at a scale in- 
dexed as jo is given by an expansion of functions 
&j ,k whose coefficients are called approximation co- 
efficients. The signal can hence be expanded as a 
series of infinite levels j, as in eq. (10 1, or partially 



expanded up to a level j and approximated at this 
level. Namely, we can expand the signal s(x) as fol- 
lows: 



j=—ao k 
3a 

j=—oo k k 



A jQ (x) 

The approximation Aj (x) can be expressed either as 
a wavelet expansion of all the levels coarser than jo or 
as an expansion on unity functions at a single level jo . 
Hence, it is possible to obtain the wavelet coefficients 
from the approximation, as the wavelet projections of 
the approximation coincide with those of the signal at 
any level coarser than jo. It should be noticed that if 
the signal is discrete, it coincides with its approxima- 
tion at any level finer than that of the discretization 
scale. 

The main advantage of this new decomposition is 
that the approximations Aj [x) are numerable sums, 
so we can define two numerable filters, denoted by 
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{g n } and {h n }, that can be used to obtain the wavelet 
coefficients at any level provided that we know the 
approximation at the finest level, i.e., the signal at 
its discretization level. Then, applying the conjugate 
mirror filters {<?„} and {h n } we can both obtain the 
wavelet coefficients from the signal or retrieve the sig- 
nal from their coefficients with very fast algorithms, 
which are exact over discretized collections of coeffi- 
cients. 

When we expand the scaling function itself <& (i.e., 
^0,0) U P to the next coarser scale jo = 1, the fil- 
ter {g n }, which we will denote by the vector g = 
(. . .,g-!,g , gi, 52, ■ • ■), is given by the wavelet coef- 
ficients, while the filter {h n }, which we will denote 
by the vector h = (. . . , h—\, ho, hi, /12, ■ ■ is given by 
the approximation coefficients, namely: 

$(ar) = ^2g n ^l,n( x ) + ^2 h n^l,n( x ) ( 18 ) 
n n 

where the wavelet coefficients at any level finer than 
j = 1 are zero because the unity function coincides 
with itself at level j — 0. 

Let us now suppose that we have a discretized sig- 
nal Sk defined by a collection of values, which are nat- 
urally identified as the approximation coefficients at 
the highest resolution f3o t k — Sk- We will denote this 
collection of approximation coefficients by the vector 

/3q = (. . . ,/3o,-i, A),o, A>,i, A),2, • ■ •)■ Since we have 
previously said that r = 2 J , having the highest res- 
olution at level j = means that we are expressing r 
in units of pixels. To obtain the wavelet coefficients at 
the next coarser level j = 1 we apply the filter g. Let 
di be the vector of these wavelet coefficients, then we 
have: 



aik 



E 



<?n-2fc Po,n 



(19) 



that is, the filter g acts by convolution on the vector 
(3. For later convenience, let us introduce the matrix 
G that represents the action of g by convolution, i.e., 
G ran ' = 5n'-2n- We can now elegantly express eq. ( 19 1 
in vectorial form as: 



Oil 



(20) 



Notice that the expression above can be used to 
relate the approximation and the wavelet coefficients 
of any two consecutive resolution levels, i.e., 



a j+ i 



(21) 



but in order to obtain the wavelet coefficients at any 
other resolution we need an expression to obtain the 



coarser approximations derived from the highest re- 
solved one. This can be done by means of the filter h. 
Analogously to what has been derived previously, we 
have that two consecutive approximation levels can be 
related by the filter h as follows: 



(22) 



where M nn i = h n i-2n- We already have the essentials 
to perform a perturbative analysis on the wavelet. 

B. Perturbative analysis 

In general, most of the wavelet bases applied to the 
analysis of given data are not optimal. This means 
that the cascade does not hold in the microcanonical 



sense and so eq. ( 16 ) cannot be used. In the following 



we will show that when the wavelet basis is relatively 
close to the optimal basis, the linear model proposed 
by Pottier et al., eq. ([I}, is verified. Our proof is 
based on the QMF representation introduced in the 
previous subsection and it is focused on ID signals for 
simplicity. The generalization of higher dimensions is 
straightforward. 

First, let the optimal QMF be denoted by (g,h). 
At the discretization level j = 0, the signal corre- 
sponds to the vector /3p pt = (. . . , s_i, sq, si, . . .). Let 
us consider now the Child and the Parent scale levels 
as the two next coarser dyadic levels, namely j c = 1, 
rc = 2 pixels and jp = 2, rp = 4 pixels (notice that 
the wavelet coefficients at levels j < are all zero as 
discrete signals cannot vary inside their pixels, i.e., at 
levels finer than the discretization scale). This way, 
eq. (20 1 is notated as: 



S° c pt = G-/3° pt (23) 

The approximation to the next level is given by: 

/3° pt = H-/3° pt (24) 

from which the details at the coarser resolution (par- 
ent coefficients) can be deduced: 



-•opt 



G • /3 opt 
G • H ■ ^° pt 



(25) 



Owing to the fact that the QMF is optimal, at each 
location k we can find an independent cascade variable 
r\k such that: 



opt opt 

a C,k = Vka p[k/2} 



(26) 



If we define now the matrix N formed by these cascade 
variables disposed on the diagonal, namely: 



8 



[k/2]k' 



(27) 



we have that the cascade relation between children 
and parent coefficients can be written for the child 
and parent detail vectors as follows: 



-^opt 



N ■ a° P pt 



(28) 



Let us now introduce a small perturbation on the 
optimal QMF; we will define a new, suboptimal QMF 
(if, h') = (g + Sg, h + Sh) for small Sg and Sh. The 
new child detail vector will be given by: 



ac = 



-^opt 



00 



SG-/3 



N 



-n>Opt 



■SG-Po 



(29) 



Notice that we have made the assumption [3q = pQ Pt 
as both are identified with the signal itself at its dis- 
cretization scale. The next coarser approximation vec- 
tor is: 



pi = (H + OT)-/3b 
= p opt + SB-p 



(30) 



Finally, the details at the next coarser resolution up to 
the first perturbation order are given by the following 
vector: 



Olp 



(G + SG) ■ /? opt + G • <JH • A) 



-*opt 



+ G ■ SB) ■ /3q (31) 



Combining eq. (29 1 and eq. (31 1 we obtain 



a c = N-a P + [5G-N- (SG-B + G ■ SB)]-{3 (32) 
Defining now o*o as: 



N • (SG ■ B + G ■ SB)} ■ f3 Q (33) 



when substituted in eq. (|32j) we obtain the vector ver- 
sion of the linear model, eq. (JT|), introduced in |10j . 
namely: 



dc — N • dp 



(34) 



According to our derivation we can now make some 
remarks about the variables rj and ag appearing in 
the linear model. First, the variable ?/o is an actual 
cascade variable, distributed according to the same 
statistics, and up to the first order it is independent 
from the parent coefficient in the suboptimal basis. 
Second, the variable ao is much smaller than the term 



rjo ap and is only relevant for small values of ap . We 
cannot say much about the statistical distribution of 
ao, not even whether it is independent or not from 
the other term. However, it is reasonable to think 
that this variable is governed by the fluctuations due 
to the mixing of the different terms in the definition 
of «o (see eq. (33)) and the arbitrary character of the 
perturbations SG and SB. This fact allows to consider 
this variable independent from 770 ap , as the experi- 
ences in |10j confirm. 



C. Optimization strategies 

The results in the previous subsection show that the 
amplitude of ao (the optimality degree) varies contin- 
uously under perturbations on the wavelet. Hence, 
an optimization strategy based on successive correc- 
tions of the wavelet would lead to the actual optimal 
wavelet, provided that the initial guess is not too far 
away from the optimality. 

As seen in Section |III| all cascade variables 77 are 
equally distributed, independent of the wavelet basis 
from which they are derived, and their moments can 
be retrieved from r v . In addition, the expectation 
value of 1 77 1 is fixed due to translational invariance [11 
|2"9"] : (I77I) = c 2r d l 2 in an arbitrary dimension d; (\r]\) = 
for ID signals. According to the linear model, 

eq. ([IJ, the expectation value of |rjj| is: 



= (\Vo + aoap 1 !) 



(35) 



Let us explore the two asymptotic limits. If the 
wavelet is optimal then o?o = so: 



v) = {\m\) = M. 



(36) 



In the opposite case, for a highly non-optimal wavelet 
we will have that a^/ap S> 770 and taking ao indepen- 
dent of ap we would obtain that: 



(37) 



where q = dapDdapl" 1 ), which by Jensen's inequal- 
ity |33j is greater than one: q > 1, for any random 
variable ap. For an intermediate case, the preceding 
two regimes are combined. If p is the proportion of 
the range of values of ap for which 770 > ao/ ap and 
(1 — p) is its complementary, we roughly have that: 



(I77I) « p(\ V \) + (l-p)q(\r,\) 



(38) 



Hence, in any instance (I77I) > (|?7|) and (|?7|) = (|?7|) 
for the optimal wavelet only. We normalize this quan- 
tity to define the optimality degree Q as: 



Q 



(39) 
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which is Q > 1, and Q = 1 for the optimal wavelet 
only. Q is a monotonic function of the amplitude of 
ao (which in fact measures the deviation from the op- 
timal case), so that Q not only evidences the optimal 
wavelet case (when Q = 1) but it actually ranks sub- 
optimal wavelets by their respective deviation from 
optimality. 

An alternative approach would consist in analyzing 
the degree of independence between fj and ap. As 



stated in Section III B independence between these 
variables is an indicator of the optimality of the 
wavelet. This can be expected, as having Q > 1 im- 
plies correlation between f] and ap, and correlation 
implies statistical dependence. In this case decorrela- 
tion {Q — 1) implies independence also, as Q = 1 im- 
plies optimality and optimality implies independence. 
In fact, t) and ap are negatively correlated in subop- 
timal cases (Q > 1), and uncorrelated only for the 
optimal wavelet: 



Q= = 1 Cov(|fl,|ap|) 



\v\) 



<KI> 



(40) 



A standard measure of statistical dependence is the 
mutual information. Therefore, the mutual informa- 
tion between f\ and ap, I — I(rj,ap), could also mea- 
sure the degree of optimality of a wavelet. However, 
the advantage of using Q instead of I comes from the 
fact that Q is less numerically sensitive to sampling 
size than /. The main problem with the practical cal- 
culation of the mutual information is that it is very 
data demanding (see the estimation of uncertainties 
in [b] and the numerical study in the next section) . 
Hence, when only small and short datasets are avail- 
able, Q is more convenient as indicator of the opti- 
mality degree. 



V. RESULTS 

Now we want to show in practice the theoretical 
results given in the previous section, namely the va- 
lidity of the linear model, eq. (34 1, and the perfor- 
mance of our measures of optimality, Q and /. We 



have generated synthetic signals according to a given 
cascade process and with a prefixed optimal wavelet 
basis. The cascades are generated by first calculating 
the wavelet coefficients through eq. ( 16 1 for dyadic 



scale steps, and then generating the signal from these 
wavelet coefficients, eq. (10 1, with the chosen wavelet 



basis. The multiplicative variable r\ is a random vari- 
able following a given cascade distribution without 
horizontal correlations, i.e., it follows Benzi et al.'s 
model [5]. As distribution for the cascade variable 
r\ we have chosen the log-Poisson distribution, which 



has been proposed in many different physical systems 
[331 123 HH] • Hence, we have chosen a translationally 
invariant log-Poisson characterized by having a most 
singular manifold of dimension D x = and singular- 
ity exponent hoc = — h, which is a realistic choice of 
parameters [29j [34] . See the [A] for a description of the 
log-Poisson distribution and parameters. 

Regarding the linear model, it has been derived by 
perturbative analysis. In Figure [3j we validate this 
model in practice, for a very long series of 67 108 864 
points. Figure [3] top shows the probability density 
function of the child coefficient ac conditioned by a 
given value of the parent coefficient ap . In Figure [3] 
top left the analysis wavelet is the optimal wavelet 
and in Figure [3] top right the analysis wavelet is a 
suboptimal wavelet. First, we can observe that for 
any value of the parent coefficient, the child coefficient 
is symmetrically distributed p(ac|ap) = p{— ac|ap), 
what means that (rj ) = (a ) = 0; this also implies 
p(ac\ctp) = p(ac\ — ap). We also observe that the 
standard deviation of the child coefficient conditioned 
by a value of the parent coefficient depends hyperbol- 
ically on it, as predicted by the linear model, namely: 



»c lap 



a c |a P ) 



Aap 



B 



(41) 



where the constants A and B are given by the lin- 
ear model: (rjg) = A and (q.q) = B. For the opti- 
mal wavelet, A = (rj 2 ) and B = 0, so that t?o co- 
incides with rj. Additional evidence is furnished by 
the conditioned histograms of logarithms of the par- 
ent and child coefficients, i.e., the conditional prob- 
ability of ln|ac| for a given value of ln|ap|, which 
is shown in Figure [3] bottom. As before, the bot- 
tom left histogram corresponds to the optimal case 
while the bottom right histogram is a suboptimal 
case. The absolute values fold the top histograms 
to the first quadrant while the logarithms balance 
the kurtotic distributions of the wavelet coefficients. 
When the series is analyzed with its optimal wavelet, 
the histogram exhibits a perfectly straight maximum- 
probability line and small dispersion around this line. 
In contrast, when the series is analyzed with a sub- 
optimal wavelet the histogram bends on the left to 
a horizontal line. This bending is in agreement with 
the linear model, eq. (34 1, as the term a becomes 



dominant when ap is too small. The two asymptotic 
limits can be easily obtained from eq. ( 34 1 : when 
the value of the parent coefficient ap is large, in 



ln|ac| = ln|?7oap|+ln 



1 



770 a P 

becomes irrelevant, so that In |ac| 
When the value of the parent coefficient ap is small, 



the second term 
=i ln|a P | +ln|77o|. 



in In lac I = In |ao| 



In 



1 



ao 



the second term 
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rapidly becomes irrelevant, so that ln|ac| ~ hi|ao|- 
Not only the asymptotes, but also the central behavior 
is the one given by the model, as the line of maximum- 
probability of the histogram fits a shape: 

ln|ac|m. P . = ln(|a | m . p . + |ry |m.p. expln|a P |) (42) 

where m.p. stands for maximum probable, i.e., these 
values are the probability maxima of their respective 
distributions. In addition, the amplitude of the fluc- 
tuations of «o is larger than that of tjq. That is why 
the left side of the histogram shows large dispersion 
that is reduced as ln|ap| grows and tends to that of 
the optimal case in the right side. 




FIG. 3: Joint histograms of etc vs. ap (top) and ln|ac| 
vs. ln|ap| (bottom) for synthetic cascade data generated 
with the Coiflet-1 wavelet and analyzed with the Coiflet-1 
wavelet (left figures, the optimal case) and Battle-Lemarie- 
6 wavelet (right figures, a non-optimal case). Each his- 
togram column has been normalized so that it corresponds 
to the probability distribution function of the vertical-axis 
variable conditioned to a given horizontal-axis value. Val- 
ues range —0.125 to 0.125 in both axes (top figures) and 
—32 to 1 in both axes (bottom figures). The analyzed 
data are a single series of very high resolution (67 108 864 
points) and the histograms are defined by a grid of 25 x 25 
bins (top) and of 50 x 50 bins (bottom), smoothed with 
a cubic spline to en hanc e presentation. The generating 
cascade process, eq. ( |14[ ), is a log-Poisson of parameters 
Doc = and h x — — | (see [X| for a detailed description of 
the process). 

In a more extensive test, we have used 24 stan- 
dard wavelets of very different families. These are: 



Haar, Daubechies (orders 2 to 10), Coiflet (orders 1 
to 5), Symmlet (orders 4 to 8) and Battlc-Lcmaric 
(spline wavelets) (orders 1, 2, 3 and 6). Notice that 
Haar and Daubechies-1 coincide, while Symmlet 1 to 
3 also coincide with Daubechies 1 to 3 respectively, 
and for that reason we have not repeated them (see 
[7] for a description of these wavelet bases) . For each 
wavelet, we have generated 64 series of 4096 points, 
which is a quite realistic size. Hence, we have gener- 
ated 24 ensembles of series and each wavelet is opti- 
mal in an ensemble. For a given ensemble, we have 
processed it with the same 24 wavelet bases. That 
is, for each ensemble we have tried its optimal basis 
and 23 non-optimal bases. We have hence performed 
24 x 24 = 576 different tests to check the validity of 
the theoretical results presented before. 

In Figure[4]we present the joint histograms of In \ctc | 
vs. ln|cvp|, obtained from the different ensembles 
when they are analyzed with the 24 bases, arranged 
in a tabular form. By construction, the histograms 
on the diagonal of this table correspond to the case 
in which the ensemble is analyzed with its optimal 
wavelet, and hence these histograms exhibit the same 
optimal behavior seen in Figure [3] bottom left. In 
contrast, when an ensemble is analyzed with a sub- 
optimal wavelet the histogram bends on the left to a 
horizontal line, as in Figure [3] bottom right. As the 
optimal and analyzing wavelets become more differ- 
ent, the amplitude of the term ao increases and hence 
the extension of the horizontal line in the joint his- 
togram becomes longer. 
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FIG. 4: Joint histograms of lnac vs. lnap for synthetically generated cascade process data. The histograms have 
been arranged as in Table I] i.e., generation wavelet (rows) vs. analysis wavelet (columns), so that the main diagonal 
corresponds to the optimal wavelet cases. Each histogram column has been normalized so that it corresponds to the 
probability distribution function of lnac conditioned to a given value of lnap in the horizontal axis. Values range —20 
to 1 in both axes, the same for all the histograms. The analyzed ensembles correspond to 64 series of 4096 points each 
and the generating cascade process is a log-Poisson of parameters Doo — 1 and = — §. Each histogram has 30 x 30 
boxes. 
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TABLE I: Summary of the Q (upper side of the cell) and / (lower side of the cell) optimality measures for synthetic cascade 
data. Each row corresponds to an ensemble generated with the wavelet written sideways at left (generation wavelet), 
while each column corresponds to the results obtained while analyzing these ensembles with the wavelet written at top 
(analysis wavelet). The ensembles correspond to 64 series of 4096 points each and the generating cascade process is a 
log-Poisson of parameters Doo — and = — |. Mutual information (/) is expressed in bits. Uncertainties of two 
sigmas are 0.0025 for Q and 0.03 bits for I. 
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In Table |T] we present the results of the mutual in- 
formation / between fj and a p , and the Q parameter 
as defined in eq. ( 39 1 for the different combinations of 



ensemble and analysis wavelet. As shown in the table, 
only when the processing wavelet coincides with the 
optimal wavelet the values of / and Q drop to and 
1, respectively, while for other, non-optimal wavelets 
these values are always higher. This proves that Q has 
the same performance as / to assess the optimality of 
a wavelet basis, but the Q parameter is less statisti- 
cally demanding. 

The Q parameter is obtained by means of the aver- 
age of f] and so, according to the Central Limit Theo- 
rem, it converges to its theoretical value with a stan- 
dard deviation that depends on the number of samples 
N as N~i , (T/\fj\\ — <7|jj| iV~2 (recall that the average 
in the denominator of Q, (\rj\), is theoretically fixed 
to A= due to translational invariance). a\fj\ depends 
on the wavelet and can be analytically calculated for 
the optimal case only, which in fact is the most in- 
teresting case as we want to have the error bar that 
discriminates optimal from non-optimal wavelets. For 
the distribution used here, log-Poisson with = 1 
l 



and h n 



2 , it is a\ v \ 



0.31, and 



so the standard deviation of Q goes as 0.62 AT- 2. As 
shown in [B] the estimation of the mutual information 
I has also a standard deviation depending on N~ s , 
but the proportionality constant is y/ui, which in our 
log-Poisson distribution is 5.66 bits. In addition, we 
do not take into account other sampling uncertainties 
stated in [5] that do not depend on N. The absolute 
uncertainty for / is 12 times that of Q, although their 
typical values are more than an order of magnitude 
smaller. For these reasons, we have analyzed relative 
large ensembles (64 series of 4096 points each) to show 
that Q performs equally well as I for large ensembles, 
but Q has the potential to be useful for smaller en- 
sembles. 

The results presented so far imply that the linear 
model is correct to describe the scaling relations in 
synthesized signals. We also know that this model 
is also correct for many wavelet basis on natural im- 
ages 0[THl[19l|2O] an d f° r satellite-derived chlorophyll 
maps [TU]. As a new example, we have processed the 
sequence of dye dispersed in 2D turbulence introduced 
in Section [TT] This sequence consists of 81 snapshots, 
each one recorded on a 512 x 512 gray-level image. 
The images were acquired with a CCD camera each 
0.5 seconds and were calibrated so that the intensity 
level of each pixel is proportional to the dye concentra- 
tion over that area. Some pre-processing is required in 
order to perform wavelet analysis of these data. First, 
data have a reduced dynamic range and acquisition 



noise is relevant, so we have reduced the resolution 
of images by averaging intensity values by blocks of 
4x 4 pixels. We hence obtain a sequence 128 x 128 
images with increased accuracy in the value at each 
pixel and increased signal-to-noise ratio. Second, it 
should be noticed that some areas of the image do 
not contain dye, specially during the early times of 
the sequence, and over those areas the wavelet coef- 
ficients vanish or take very small values. Henceforth, 
once the wavelet decomposition is obtained we have 
discarded those wavelet coefficients lying in areas at 
which dye concentration is negligible. The two stages 
of pre-processing considerably reduce the amount of 
available statistics, what is partially compensated by 
the fact of having 81 snapshots, although time corre- 
lation between snapshots is high and so the effective 
sampling size is moderate. 

We have applied the 2D separable surrogates of the 
24 wavelet bases used so far, and we have studied 
the optimality of each basis. 2D separable QMFs are 
formed by three different wavelet bases (also called 
"orientations", denoted by horizontal (h), vertical (v) 
and diagonal (d) details, [7]), each one defining its own 
cascade pyramid, in analogy to what was presented in 
[9]. Hence, we must study the pyramid defined by 
each orientation separately, what leads to define the 
parameters Qh, Q v and Qd, and analogously for the 
mutual informations. Results are summarized in Ta- 
ble El 

The results indicate that the behavior of each orien- 
tation are not necessarily related, although a wavelet, 
Daubechies 5, attains a good performance for all ori- 
entations: attending to Q parameter, this wavelet is 
the one closest to optimal for horizontal and vertical 
orientations, and is just 4.7% above the best one for 
diagonal orientation. In addition, the values of the 
three mutual informations are in the lowest range for 
this wavelet. Notice that the value of mutual informa- 
tions arc not very significant for any of the wavelets 
due to the limited sampling size. 
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TABLE II: Summary of the Q (upper side of the cell) and I (lower side of the cell) optimality measures for the dye 
dispersed in 2D turbulence sequence. Each row correspond to a different wavelet orientation, while each column corre- 
sponds to the results obtained while analyzing the sequence with the wavelet written at top (analysis wavelet). Mutual 
information (J) is expressed in bits. 
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VI. CONCLUSIONS 

In this paper we have discussed on the properties of 
optimal wavelet bases for the representation of mul- 
tiplicative cascades. With the aid of optimal wavelet 
bases, any given signal originated by a cascade can 
be explicitly represented in terms of that multiplica- 
tive cascade, s. When the wavelet basis used in the 
analysis is suboptimal, the local cascade variables are 
poorly described. We have shown that the multiplica- 
tive process is perturbed by the inclusion of an addi- 
tive, noise-like term, with an amplitude depending on 
the deviation from optimality of the studied basis. 

We have then proposed to quantify the degree of 
optimality of a given basis with a simple descriptor 
Q, defined as the ratio of the first order moment of 
estimated cascade variable by the first order moment 
of the actual cascade variable. As this quantity is ob- 
tained from first-order moments, it is not demanding 
in data, and as any deviation implies an increase in the 
first order moment of the estimated cascade variable, 
the optimal wavelet is an absolute minimum of this 
quantifier. Hence, Q can be used in any minimization 
strategy to derive the optimal wavelet. 

To exemplify the derivations on the behavior of Q, 
we have used 24 different standard wavelets to ana- 
lyze synthetic cascades generated with these same 24 
wavelets. Our experiences reveal that Q is more accu- 
rate than mutual informations in order to determine 
optimality in reduced datasets. In addition, as an ex- 
ample on real data, we have analyzed a sequence of 
dye dispersed in 2D turbulence and found that, among 
the 24 wavelets tested, Daubechies 5 is the closest to 
optimality. 

With the help of the theory settled in this paper, we 
can undertake a more ambitious program of research. 



A natural future research line consists on implement- 
ing a continuous optimization strategy based on the 
descriptor Q, so that we could derive optimal wavelets 
of given databases of real signals without restricting 
the search to given families of wavelet bases. For each 
system we could hence prove if there exists such an 
optimal wavelet basis and even if it does not exist, we 
will be able to derive the best one. This would im- 
prove our knowledge on the dynamics of the studied 
systems. Therefore, with the aid of optimal wavelets 
we could tackle problems such as data compression, 
forecast or inference of missing data, among others. 
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APPENDIX A: CONNECTION OF THE 
MICROCANONICAL CASCADE WITH THE 
MULTIFRACTAL SINGULARITY SPECTRUM 

A signal s is said to be multifractal in the micro- 
canonical sense |35j if an intensive functional e r acting 
on this signal (see Section [lH| can be characterized by 
local scaling relations of the type: 

e r (x) = a(x) r h ^ + o (r h ^A (Al) 

where the symbol o {r h ^>\ means a term that is neg- 
ligible in comparison with r h ( x \ The function that 
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comprises the local properties of changes in scale, 
h(x), is called the singularity exponent of the signal at 
the point x [29| 135] 
"multifractal" 



A signal verifying eq. (Al 



is said 



k — ► 0. As the singularity components Fh K are of 
fractal character, the distribution of singularity expo- 
nents at a given observation scale behaves as [36] : 



(in the microcanonical sense) because 
each value h of singularity exponent is associated to a 
singularity component = {x : h(x) — h} of fractal 
character, with Hausdorff dimension D(h). The func- 
tion D(h) is known as the singularity spectrum of the 
signal [3"6] , 

An interesting feature of the singularity spectrum 
is that although it is a geometrical feature of the mul- 
tifractal, it completely defines the statistical proper- 
ties of the cascade process. In fact, Parisi and Frisch 
[37] proved that the knowledge of D(h) granted the 
knowledge of the distribution of the cascade variables 
n through the knowledge of the multiscaling exponents 



p(h K ) 



K 



d-D(h K ) 



(A5) 



with, as stated, k — > 0. A direct obtaining of the D{h) 
is hence possible through: 



k^o In k 



where: 



h 



lim h K 



(A6) 



(A7) 



as expressed by eq. ([2]). In that case, r p is related Lemma: The singularity spectrum derived accord- 



to the singularity spectrum of the multifractal through 
a Legendre transform: 



= inf {ph 

h 



d-D(h)} 



(A2) 



which is known as the Parisi-Frisch formula and is the 
cornerstone of the canonical multifractal formalism. 



An interesting corollary of eq. ( A2 ) is that when D(h) 



is convex the Legendre transform can be inverted and 
hence D(h) can be expressed as the Legendre trans- 
form of the multiscaling exponents t p , namely: 



D L (h) 



inf {ph 

p 



(A3) 



The function Dl{K) is the so-called Legendre singular- 
ity spectrum, which is a convex function of h because 
Legendre transforms are always convex. If D(h) is 
convex, D(h) = Dl(K); if D(h) is not convex, Dl(K) 
will be its convex hull. 

There is a more direct approach to D{h) that can 
be used when the cascade variables are accessible and 
eliminates the necessity of imposing convex spectra 
D{h). This approach consists in calculating the limit 
as k — ► of the distribution of cascade singularity 
exponents. The cascade singularity exponents are de- 
fined as follows: 



K = log K ry K 



In rj K 
In k 



(A4) 



where n K is the multiplicative cascade variable that 
relates e r with cl, k = r/L, as in eq. The cascade 
singularity exponents represent the singularity expo- 
nents in the same sense of eq. ( |A1 | when they are ob- 
tained at the resolution level [35]. i.e.. when the scale 
ratio k is the one that compares the largest (whole- 
domain wide) scale L with the smallest (resolution- 
level) scale r, meaning that r << L or equivalently 



ing eq. (A6) coincides with the Legendre spectrum, 



eq. (A3), when the singularity spectrum is convex 



Proof: First, we define a random variable h K such 
that rj K = k Hk , i.e., 



h K = 



hiTfc 
In k 



(A8) 



As the cascade variable rj K is derived from a multi- 
fractal signal, the limit in eq. (A6) exists and it is 
d — D(h) (the Hausdorff spectrum of the signal) [35 . 
Therefore, the distribution of h K has a leading order 

^-D^) ^ foll0W g. 

p(K) = A K K d - D ^ + o{n d - D ^) (A9) 
for small values of n. Recalling here eq. Q we have: 

ln(^) 



t p = lim 

k^o in k 



We then expand it to find that: 
1 



(A10) 



lim In 

k— »o In k 



dh K K h * p p(h K ) 

lim -i- In ( [ dh K K h " p A K K d - D{h ^ 
k^o In k \J 



lim inf{/i K n + d 

k->0 h K 



D(h K )} 



= inf{hp + d- D(h)} 

h 



(All) 



where we used the saddle-point approximation. No- 
tice that eq. (All) is analogous to eq. (A2|. Recalling 



that the inverse of a Legendre transform on convex 
functions is another Legendre transform, if we obtain 
now the Legendre spectrum, eq. (A3 1, and assuming 
that D(h) is convex we conclude Djjh) = D(h), q.e.d. 



1G 



We will show now two examples of the lemma 
above, for two commonly used multiplicative pro- 
cesses, namely log-normal and log-Poisson processes. 
A log-normal process has the following distribution: 



1 



y/27 



Hence, the t p as defined in cq. (pi are given by: 



with 



w(h) 



1 h 



In (3d- D 



(A19) 



(A12) Let us now apply eq. (A6l. From equations 



(A4| and ( A16 1, the h K deviates from the most singu- 



lar exponent in an integer number n of contribu- 
tions log K [3, namely 



In re 2 In re 



(A13) 



Let h m = p K /\nn and a\ = — 2cr^/lnre (remember 



that re < 1), so eq. (A3) leads to the singularity spec- 
trum D{h): 



D{h) 



h — h Tl 



(A14) 



Let us show now that eq. ( A6 1 leads to the same ex- 



pression. Notice that eq. (A4) means that p(h K ) 
— In re p(ln rj K ) . Then, we s ut 



a i = ~ a h in e 1- ( A12 1 to obtain 



Dstitute fi K = h m In re and 



h\p{h K ) 
In re 



hi 



-In/ 



In K 



(A15) 



and the second term vanishes as re — > leading to 



eq. (A14I. It follows that eq. (A6| holds 



The log-Poisson case is a little bit more elaborated 
due to the discrete-to-continuous passage. A log- 
Poisson process is defined as rj K — n hoa (3 n with n being 
a Poisson variable of parameter A. Then the distribu- 
tion of In r] K is: 



p(mr? K ) = ' 



A™ 



<5(ln r] K — /ioo In re — n In /3) 



n=0 



(A16) 

which is discrete, i.e., it only takes nonzero values for 
some values of ln?y K . The parameter hoo is the sin- 
gularity exponent of the Most Singular Component 
(MSC) OUS], while the parameter A is related to the 
dimension of the MSC: A = (d — -Doo)(— hire) (both 
parentheses are always positive). It is also required 
that < j3 < 1. After some simple algebra, it is 
obtained that r p are given by: 

t p = phoo + (d - Axoa - p p ) (Ai7) 

and through eq. ( |A3[ ) the singularity spectrum is: 
D(h) = D oa + (d- Doe) u{h) (1 - \tluj(K)) (A18) 



n - 



ln/3 

h 

In re 

Ah K 



(A20) 



which give rise to a continuum of h in the limit 
(— hire) — > oo. Let us now define a convenient auxil- 
iary variable, u>(h K ), as 



u(h K ) = 



A 



1 



d - A» (-lrire) 
In (5 d — D 



(A21) 



Notice that u){h K ) is positive and proportional to Ah K . 
We now recall eq. ( A16 1 to obtain: 



kip(h K ) 
In re 



-X + nln A — Inn! 
In re 



(A22) 



Hence, according to eq. ( A6 1, the singularity spectrum 
is: 



„,,s , , , „ s , n In A — Inn! 

D(h) =d- (d-D oa )+ lim (A23) 

k^o — In re 



Where h = h K ^ as in eq. ( |A7| . For any h K different 
from hoo, i-e., Ah K ^ 0, when re goes to 0, n grows 
accordingly, because n is proportional to (—In re). So 
the limit re — > is equivalent to n — > co: 

„ , „ , n In A — nlnn + n — ln( v/%m) 
D(fc) = Doo + lim - ^ 1 

n— >co — In re 

(A24) 

where we have used the Stirling approximation to ex- 
pand n\. Recalling (—In re) = n ((d — -Doo) w (^k)) _1 
we have: 

= L>oo + (d- Doo) lim (In A -lnn+ l)w(h K ) 

n — >-oo 

(A25) 



which, as uj(h K ) = n/X, leads to eq. ( A18 1 
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APPENDIX B: CONVERGENCE OF THE 
ESTIMATES OF THE MUTUAL 
INFORMATION 

In this section we will calculate the standard devi- 
ation of the empirical estimates of the mutual infor- 
mation between two random variables. The mutual 
information between two variables X and Y is given 
by the following expression: 



I{X,Y) = H(X) + H(Y) — H(X,Y) 



(Bl) 



where H(X), H(Y) stands for the marginal entropies 
and H(X, Y) stands for the joint entropy. We will 
calculate the standard deviation on the estimate of 
H(X, Y), the other two being completely analogous. 
The ideal joint entropy is given by: 



(B2) 



We will express the entropy in nats instead of in bits 
for convenience in later calculations, so we use natural 
logarithms. 

Let us now suppose that we sample the state space 
with a histogram of B x x B y boxes. We will assume 
that the sampling is efficient, so each box contains 
only one p nm at most. Void boxes, in case they ex- 
ist, can directly be discarded as they do not change 
anything in the calculation, so without loss of gener- 
ality we can assume that the sampling is perfect, and 
each box contains one and only one weight p nm , so we 
can index boxes according to the weight index: B nm . 
Notice however that B x and B y are finite. We will 
first assume that they are large enough to make the 
contribution by the uncounted tails negligible. 

For a sample of N independent events we estimate 
Pnm w hh the following statistical: 



Pr, 



N 



(B3) 



where N nm is the number of events happening to lie in 
box B nm . The joint distribution of the variables N nm 
is a multinomial of order TV with B x x B y variables, 
each with probability p nm . If N is large enough we can 
disregard correlations and consider the distribution of 
each N nm as an independent binomial; notice however 
that this independence cannot be assumed when the 
weight estimates are summed up, as £ nm p„ m = 1. 
Notice also that correlations would increase our es- 
timate of uncertainty, so the value we are going to 
obtain should be considered a lower bound. Let us 
introduce a convenient representation for p nm : 



Pnm — Pnm ~t~ &Pn 



(B4) 



where we can express the deviation Sp nm in the fol- 
lowing way: 



Spn 



Pnm ( 1 Pnm) 



N 



Pnm 

IsT 



(B5) 

where the random variable e nm is standardized (i.e., 
it has zero mean and unit variance). 

The estimated joint entropy H (X, Y) is hence given 
by: 



(B6) 



Expanding this expression up to the first order in 
5pnm we have: 

H(X,Y) « H(X,Y) - ^ Pnm ln(l+ 5 ^) 

V Pnm J 

-^2Sp nrn lnp nm (B7) 
w H(X,Y) - N 2 2_^e nm pnm lnp nm 

n,m 

where we have used that J2 n ,m S P^m = S„, TO Pnm - 
mPnm = 1 — 1 = 0. We conclude that the devia- 
tion between the estimate of the joint entropy and its 
actual value is given by the following expression: 

5H(X,Y) = H(X,Y) - H(X,Y) 

W - N 2 ^ 6 nm Pnm In Pnm (B8) 



H(X 7 Y) converges to H(X, Y) when TV goes to infin- 
ity. Hence, in order to compute the standard deviation 
of H (X, Y) we just need to compute that of 8H{X 1 Y) . 
Now taking the variables e nm as independent (a first 
order approximation), we have: 



,.)//( v.r> = J^fe 



(B9) 



where e is a standardized variable and loxy is given 
by: 



uxy = ((lnjw) 2 ) 



(BIO) 



Analogous expressions arise for SH(X) and SH(Y), 
having their corresponding uox and uy respectively. 
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Hence, the deviation of the mutual information es- 
timate SI(X, Y) is given by the squared sum of the 
deviations of the joint and marginal entropies, with a 
global it) that is the sum of the joint u>xy and marginal 
lox and ujy ■ Thus we can estimate the minimum num- 
ber of samples Nq to attain a given accuracy level 
SI(X, Y) according to the following expression: 



N 



SI{X, Yy 



(Bll) 



The dependence on the square of the accuracy level 
makes entropy estimation very demanding in data. 
For instance, to attain an accuracy of 0.1 bits 0.07 
nats) we have iVo.i ~ 200 to; to attain an accu- 
racy of 0.01 bits we need a sample 100 times larger, 
iVo.oi « 2 ■ 10 4 w. For the case studied in section V 
4096-point series generated with log-Poisson distribu- 
tion of parameters = and = — i , the com- 
puted value of uj is around u> = 15.4 nats 2 . 

As a final remark, notice that we have made im- 
portant assumptions to derive this formula. The two 



most significant ones depend on the properties of the 
sampling using B x x B y boxes. First, we have assumed 
that we have properly sampled the histogram; second, 
we considered that the non-sampled tails do not sig- 
nificantly contribute to uncertainty. Concerning the 
first, we are assuming that the sample of the state 
space with B x x B y boxes is such that the associated 
weights {pnm} give an accurate idea of the mutual in- 
formation; for instance, if X and Y are independent 
then that with a good approximation p nm — p^Pm- 
Concerning the second, we need to assume that the 
excluded tails decay fast enough not to significantly 
alter the value of the entropies. These two contribu- 
tions will increase the dispersion 8H estimated here 
in a way that does not depend on N, so the mutual 
information will never be decreased below a certain 
level even if N goes to infinity. These sampling effects 
are absolutely depending on the distribution we are 
considering and hence no a priori bound can be given 
here. 
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